Exploring the Space of Black-box Attacks on Deep Neural Networks

نویسندگان

Arjun Nitin Bhagoji

Warren He

Bo Li

Dawn Xiaodong Song

چکیده

Existing black-box attacks on deep neural networks (DNNs) so far have largely focused on transferability, where an adversarial instance generated for a locally trained model can “transfer” to attack other learning models. In this paper, we propose novel Gradient Estimation black-box attacks for adversaries with query access to the target model’s class probabilities, which do not rely on transferability. We also propose strategies to decouple the number of queries required to generate each adversarial sample from the dimensionality of the input. An iterative variant of our attack achieves close to 100% adversarial success rates for both targeted and untargeted attacks on DNNs. We carry out extensive experiments for a thorough comparative evaluation of black-box attacks and show that the proposed Gradient Estimation attacks outperform all transferability based black-box attacks we tested on both MNIST and CIFAR-10 datasets, achieving adversarial success rates similar to well known, state-of-the-art white-box attacks. We also apply the Gradient Estimation attacks successfully against a real-world Content Moderation classifier hosted by Clarifai. Furthermore, we evaluate black-box attacks against state-of-theart defenses. We show that the Gradient Estimation attacks are very effective even against these defenses.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Hardening Deep Neural Networks via Adversarial Model Cascades

Deep neural networks (DNNs) have been shown to be vulnerable to adversarial examples malicious inputs which are crafted by the adversary to induce the trained model to produce erroneous outputs. This vulnerability has inspired a lot of research on how to secure neural networks against these kinds of attacks. Although existing techniques increase the robustness of the models against whitebox att...

متن کامل

Boosting Adversarial Attacks with Momentum

Deep neural networks are vulnerable to adversarial examples, which poses security concerns on these algorithms due to the potentially severe consequences. Adversarial attacks serve as an important surrogate to evaluate the robustness of deep learning models before they are deployed. However, most of the existing adversarial attacks can only fool a black-box model with a low success rate because...

متن کامل

Simple Black-Box Adversarial Perturbations for Deep Networks

Deep neural networks are powerful and popular learning models that achieve state-of-the-art pattern recognition performance on many computer vision, speech, and language processing tasks. However, these networks have also been shown susceptible to carefully crafted adversarial perturbations which force misclassification of the inputs. Adversarial examples enable adversaries to subvert the expec...

متن کامل

Divide, Denoise, and Defend against Adversarial Attacks

Deep neural networks, although shown to be a successful class of machine learning algorithms, are known to be extremely unstable to adversarial perturbations. Improving the robustness of neural networks against these attacks is important, especially for security-critical applications. To defend against such attacks, we propose dividing the input image into multiple patches, denoising each patch...

متن کامل

Adversarial Phenomenon in the Eyes of Bayesian Deep Learning

Deep Learning models are vulnerable to adversarial examples, i.e. images obtained via deliberate imperceptible perturbations, such that the model misclassifies them with high confidence. However, class confidence by itself is an incomplete picture of uncertainty. We therefore use principled Bayesian methods to capture model uncertainty in prediction for observing adversarial misclassification. ...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

CoRR

دوره abs/1712.09491 شماره

صفحات -

تاریخ انتشار 2017

Exploring the Space of Black-box Attacks on Deep Neural Networks

نویسندگان

چکیده

منابع مشابه

Hardening Deep Neural Networks via Adversarial Model Cascades

Boosting Adversarial Attacks with Momentum

Simple Black-Box Adversarial Perturbations for Deep Networks

Divide, Denoise, and Defend against Adversarial Attacks

Adversarial Phenomenon in the Eyes of Bayesian Deep Learning

عنوان ژورنال:

اشتراک گذاری